484 research outputs found

    AI-generated Content for Various Data Modalities: A Survey

    Full text link
    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC developments have been attracting lots of attention recently, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we also discuss the challenges and potential future research directions

    Distribution-Aligned Diffusion for Human Mesh Recovery

    Full text link
    Recovering a 3D human mesh from a single RGB image is a challenging task due to depth ambiguity and self-occlusion, resulting in a high degree of uncertainty. Meanwhile, diffusion models have recently seen much success in generating high-quality outputs by progressively denoising noisy inputs. Inspired by their capability, we explore a diffusion-based approach for human mesh recovery, and propose a Human Mesh Diffusion (HMDiff) framework which frames mesh recovery as a reverse diffusion process. We also propose a Distribution Alignment Technique (DAT) that infuses prior distribution information into the mesh distribution diffusion process, and provides useful prior knowledge to facilitate the mesh recovery task. Our method achieves state-of-the-art performance on three widely used datasets. Project page: https://gongjia0208.github.io/HMDiff/.Comment: Accepted to ICCV 202

    System-status-aware Adaptive Network for Online Streaming Video Understanding

    Full text link
    Recent years have witnessed great progress in deep neural networks for real-time applications. However, most existing works do not explicitly consider the general case where the device's state and the available resources fluctuate over time, and none of them investigate or address the impact of varying computational resources for online video understanding tasks. This paper proposes a System-status-aware Adaptive Network (SAN) that considers the device's real-time state to provide high-quality predictions with low delay. Usage of our agent's policy improves efficiency and robustness to fluctuations of the system status. On two widely used video understanding tasks, SAN obtains state-of-the-art performance while constantly keeping processing delays low. Moreover, training such an agent on various types of hardware configurations is not easy as the labeled training data might not be available, or can be computationally prohibitive. To address this challenging problem, we propose a Meta Self-supervised Adaptation (MSA) method that adapts the agent's policy to new hardware configurations at test-time, allowing for easy deployment of the model onto other unseen hardware platforms.Comment: Accepted to CVPR 202

    ERA: Expert Retrieval and Assembly for Early Action Prediction

    Full text link
    Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only minor subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push experts to discriminate exclusively between samples that are highly similar, forcing these experts to learn to use subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance.Comment: Accepted to ECCV 202

    Heatmap Distribution Matching for Human Pose Estimation

    Full text link
    For tackling the task of 2D human pose estimation, the great majority of the recent methods regard this task as a heatmap estimation problem, and optimize the heatmap prediction using the Gaussian-smoothed heatmap as the optimization objective and using the pixel-wise loss (e.g. MSE) as the loss function. In this paper, we show that optimizing the heatmap prediction in such a way, the model performance of body joint localization, which is the intrinsic objective of this task, may not be consistently improved during the optimization process of the heatmap prediction. To address this problem, from a novel perspective, we propose to formulate the optimization of the heatmap prediction as a distribution matching problem between the predicted heatmap and the dot annotation of the body joint directly. By doing so, our proposed method does not need to construct the Gaussian-smoothed heatmap and can achieve a more consistent model performance improvement during the optimization of the heatmap prediction. We show the effectiveness of our proposed method through extensive experiments on the COCO dataset and the MPII dataset.Comment: NeurIPS 202

    DiffPose: Toward More Reliable 3D Pose Estimation

    Full text link
    Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose to facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP. Project page: https://gongjia0208.github.io/Diffpose/.Comment: Accepted to CVPR 202

    Progressive Channel-Shrinking Network

    Get PDF
    Currently, salience-based channel pruning makes continuous breakthroughs in network compression. In the realization, the salience mechanism is used as a metric of channel salience to guide pruning. Therefore, salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, there are two problems emerging: a gating function is often needed to truncate the specific salience entries to zero, which destabilizes the forward propagation; dynamic architecture brings more cost for indexing in inference which bottlenecks the inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost for filter indexing. We evaluate our method on ImageNet and CIFAR10 datasets over two prevalent networks: ResNet and VGG, and demonstrate that our PCS outperforms all baselines and achieves state-of-the-art in terms of compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference. The code will be released upon acceptance.Ministry of Education (MOE)National Research Foundation (NRF)This work is supported by MOE AcRF Tier 2 (Proposal ID: T2EP20222-0035), National Research Foundation Singapore under its AI Singapore Programme (AISG-100E-2020-065), and SUTD SKI Project (SKI 2021 02 06). This work is also supported by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215

    Human ISL1+ ventricular progenitors self-assemble into an in vivo functional heart patch and preserve cardiac function post infarction

    Get PDF
    The generation of human pluripotent stem cell (hPSC)-derived ventricular progenitors and their assembly into a 3-dimensional in vivo functional ventricular heart patch has remained an elusive goal. Herein, we report the generation of an enriched pool of hPSC-derived ventricular progenitors (HVPs), which can expand, differentiate, self-assemble, and mature into a functional ventricular patch in vivo without the aid of any gel or matrix. We documented a specific temporal window, in which the HVPs will engraft in vivo. On day 6 of differentiation, HVPs were enriched by depleting cells positive for pluripotency marker TRA-1-60 with magnetic-activated cell sorting (MACS), and 3 million sorted cells were sub-capsularly transplanted onto kidneys of NSG mice where, after 2 months, they formed a 7 mm x 3 mm x 4 mm myocardial patch resembling the ventricular wall. The graft acquired several features of maturation: expression of ventricular marker (MLC2v), desmosomes, appearance of T-tubule-like structures, and electrophysiological action potential signature consistent with maturation, all this in a non-cardiac environment. We further demonstrated that HVPs transplanted into un-injured hearts of NSG mice remain viable for up to 8 months. Moreover, transplantation of 2 million HVPs largely preserved myocardial contractile function following myocardial infarction. Taken together, our study reaffirms the promising idea of using progenitor cells for regenerative therapy.ERC AdG743225Swedish Research Council Distinguished Professor Grant Dnr 541-2013-8351The Knut and Alice Wallenberg Foundation (KAW Dnr 2013.0028)Horizon 2020 research and innovation programme grant agreement No 647714Publishe

    Search for new phenomena in final states with an energetic jet and large missing transverse momentum in pp collisions at √ s = 8 TeV with the ATLAS detector

    Get PDF
    Results of a search for new phenomena in final states with an energetic jet and large missing transverse momentum are reported. The search uses 20.3 fb−1 of √ s = 8 TeV data collected in 2012 with the ATLAS detector at the LHC. Events are required to have at least one jet with pT > 120 GeV and no leptons. Nine signal regions are considered with increasing missing transverse momentum requirements between Emiss T > 150 GeV and Emiss T > 700 GeV. Good agreement is observed between the number of events in data and Standard Model expectations. The results are translated into exclusion limits on models with either large extra spatial dimensions, pair production of weakly interacting dark matter candidates, or production of very light gravitinos in a gauge-mediated supersymmetric model. In addition, limits on the production of an invisibly decaying Higgs-like boson leading to similar topologies in the final state are presente
    corecore